Session E-1

Learning and Prediction

May 11 Tue, 2:00 PM — 3:30 PM EDT

Auction-Based Combinatorial Multi-Armed Bandit Mechanisms with Strategic Arms

Guoju Gao and He Huang (Soochow University, China); Mingjun Xiao (University of Science and Technology of China, China); Jie Wu (Temple University, USA); Yu-e Sun (Soochow University, China); Sheng Zhang (Nanjing University, China)

The multi-armed bandit (MAB) model has been studied extensively as a means of solving online learning problems such as rate allocation in communication networks and ad recommendation in social networks. In an MAB model, given N arms whose rewards are unknown in advance, the player selects exactly one arm in each round, with the goal of maximizing the cumulative reward over a fixed horizon. In this paper, we study a budget-constrained, auction-based combinatorial multi-armed bandit mechanism with strategic arms, where the player can select K (< N) arms in a round, pulling each arm incurs a unique cost, and each arm might strategically misreport its cost in the auction. To this end, we combine the upper confidence bound (UCB) with an auction to define UCB-based rewards and then devise an auction-based UCB algorithm (called AUCB). In each round, AUCB selects the top K arms according to the ratios of UCB-based rewards to bids and then determines the critical payment for each arm. For AUCB, we derive an upper bound on regret and prove truthfulness, individual rationality, and computational efficiency. Extensive simulations show that the rewards achieved by AUCB are at least 12.49% higher than those of state-of-the-art algorithms.
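A minimal sketch of the per-round step the abstract describes: rank arms by the ratio of a UCB index to the reported bid, pick the top K, and charge each winner a critical payment. The exploration bonus and the exact payment formula below are standard textbook stand-ins, not the paper's definitions:

```python
import numpy as np

def aucb_round(emp_mean, pulls, t, bids, K):
    """One AUCB-style round (illustrative sketch; assumes N > K arms)."""
    # UCB index: empirical mean plus a standard exploration bonus.
    ucb = emp_mean + np.sqrt(2.0 * np.log(t) / np.maximum(pulls, 1))
    # Rank arms by UCB-based reward per unit bid and take the top K.
    ratio = ucb / bids
    order = np.argsort(-ratio)
    winners = order[:K]
    # Critical payment: the largest bid a winner could have reported and
    # still beaten the best losing arm's ratio.
    threshold = ratio[order[K]]
    payments = ucb[winners] / threshold
    return winners, payments

winners, pay = aucb_round(emp_mean=np.array([0.6, 0.4, 0.8, 0.3]),
                          pulls=np.array([10, 10, 10, 10]),
                          t=41, bids=np.array([1.0, 0.5, 1.2, 0.9]), K=2)
```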

Bandit Learning with Predicted Context: Regret Analysis and Selective Context Query

Jianyi Yang and Shaolei Ren (University of California, Riverside, USA)

Contextual bandit learning selects actions (i.e., arms) based on context information to maximize rewards while balancing exploitation and exploration. In many applications (e.g., cloud resource management with dynamic workloads), before selecting an arm the agent/learner can either predict the context online from the context history or selectively query the context from an outside expert. Motivated by this practical consideration, we study a novel contextual bandit setting where context information is either predicted online or queried from an expert. First, considering predicted context only, we quantify the impact of context prediction on the cumulative regret (compared to an oracle with perfect context information) by deriving an upper bound on regret that takes the form of a weighted combination of the regret incurred by standard bandit learning and the context prediction error. Then, inspired by this structural decomposition of the regret, we propose context query algorithms that selectively obtain an outside expert's input (subject to a total query budget) for more accurate context, decreasing the overall regret. Finally, we apply our algorithms to virtual machine scheduling on cloud platforms. The simulation results validate our regret analysis and show the effectiveness of our selective context query algorithms.
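To illustrate the setting, here is a toy agent that acts on the predicted context unless the prediction error estimate exceeds a threshold and query budget remains, in which case it asks the expert. The LinUCB-style index, the threshold rule, and all parameter names are illustrative assumptions; the paper derives its query rules from the regret decomposition:

```python
import numpy as np

class SelectiveQueryBandit:
    """Toy contextual bandit with predicted or expert-queried context."""

    def __init__(self, n_arms, dim, budget, tau=0.5, alpha=1.0):
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward sums
        self.budget = budget   # total number of expert queries allowed
        self.tau = tau         # uncertainty threshold for querying (assumed)
        self.alpha = alpha     # exploration weight

    def choose(self, ctx_pred, pred_err, query_expert):
        # Query the true context only if the prediction is uncertain
        # and budget remains; otherwise act on the predicted context.
        if pred_err > self.tau and self.budget > 0:
            self.budget -= 1
            ctx = query_expert()
        else:
            ctx = ctx_pred
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            bonus = self.alpha * np.sqrt(ctx @ A_inv @ ctx)
            scores.append(ctx @ theta + bonus)  # LinUCB-style index
        return int(np.argmax(scores)), ctx

    def update(self, arm, ctx, reward):
        self.A[arm] += np.outer(ctx, ctx)
        self.b[arm] += reward * ctx
```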

Individual Load Forecasting for Multi-Customers with Distribution-aware Temporal Pooling

Eunju Yang and Chan-Hyun Youn (Korea Advanced Institute of Science and Technology, Korea (South))

Accurate individual load forecasting is an essential element of smart grid services. When training individual forecasting models for multiple customers, discrepancies in data distribution among customers must be considered. There are two simple ways to build such models: constructing a separate model for each customer, or training one model encompassing all customers. The independent approach achieves higher accuracy but deploys copious models, causing resource and management inefficiency; the single-model approach has the opposite trade-off. A compromise between the two is clustering-based forecasting. However, previous studies are of limited use for individual forecasting: they focus on aggregated load and do not consider concept drift, which degrades accuracy over time. We therefore propose a distribution-aware temporal pooling framework that enhances clustering-based forecasting. For the clustering, we propose Variational Recurrent Deep Embedding (VaRDE), which works in a distribution-aware manner and is thus well suited to individual load data. It reassigns customers to clusters at every period, so the clusters to which customers belong change dynamically to track distribution changes. Experiments with real data show better performance than previous studies, even for unseen data and with only a few models, leading to high scalability.
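The key mechanism is per-period reassignment of customers to clusters as their load distributions drift, with one shared forecaster per cluster. A toy version using simple distribution features in place of VaRDE's learned latent distributions (all names and sizes hypothetical):

```python
import numpy as np

def assign_clusters(windows, centroids):
    """Reassign each customer's current load window to the nearest cluster
    centroid; a stand-in for VaRDE, which clusters in a learned latent
    distribution space rather than this raw feature space."""
    # Summarize each window by crude distribution features: mean and std.
    feats = np.stack([windows.mean(axis=1), windows.std(axis=1)], axis=1)
    dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Per-period pooling: customers move between clusters as their load
# distribution drifts; each cluster keeps one shared forecasting model.
rng = np.random.default_rng(0)
windows = rng.gamma(2.0, 1.0, size=(100, 24))   # 100 customers, 24-hour windows
centroids = rng.gamma(2.0, 1.0, size=(5, 2))    # 5 clusters in feature space
labels = assign_clusters(windows, centroids)
print(np.bincount(labels, minlength=5))         # cluster sizes this period
```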

DeepLoRa: Learning Accurate Path Loss Model for Long Distance Links in LPWAN

Li Liu, Yuguang Yao, Zhichao Cao and Mi Zhang (Michigan State University, USA)

LoRa (Long Range) is an emerging wireless technology that enables long-distance communication at low power consumption. LoRa therefore plays an increasingly important role in Low-Power Wide-Area Networks (LPWANs), which readily extend many large-scale Internet of Things (IoT) applications to diverse scenarios (e.g., industry, agriculture, cities). In environments where various types of land cover exist, however, it is challenging to precisely predict a LoRa link's path loss. As a result, deploying LoRa gateways to ensure reliable coverage and developing precise fingerprint-based localization become difficult in practice. In this paper, we propose DeepLoRa, a deep learning-based approach that accurately estimates the path loss of long-distance links in complex environments. Specifically, DeepLoRa relies on remote sensing to automatically recognize the land-cover types along a LoRa link, then uses a Bi-LSTM (Bidirectional Long Short-Term Memory) network to build a land-cover-aware path loss model. We implement DeepLoRa and use data gathered from a real LoRaWAN deployment on campus to extensively evaluate its estimation accuracy and model transferability. The results show that DeepLoRa reduces the estimation error to less than 4 dB, 2× smaller than state-of-the-art models.
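A minimal PyTorch sketch of the modeling idea: embed the land-cover type of each segment along a link and regress path loss with a Bi-LSTM. Layer sizes, the mean-pooling step, and the class name are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class LandCoverPathLoss(nn.Module):
    """DeepLoRa-style sketch: land-cover sequence -> Bi-LSTM -> path loss."""

    def __init__(self, n_cover_types=8, emb_dim=16, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(n_cover_types, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # scalar path loss in dB

    def forward(self, cover_seq):
        # cover_seq: (batch, seq_len) integer land-cover labels per segment
        x = self.embed(cover_seq)
        out, _ = self.bilstm(x)
        # Pool over the link's segments, then regress path loss.
        return self.head(out.mean(dim=1)).squeeze(-1)

model = LandCoverPathLoss()
links = torch.randint(0, 8, (4, 20))   # 4 links, 20 land-cover segments each
print(model(links).shape)              # torch.Size([4])
```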

Session Chair

Lan Zhang (University of Science and Technology of China, China)

Session E-2

RL Applications

May 11 Tue, 4:00 PM — 5:30 PM EDT

6GAN: IPv6 Multi-Pattern Target Generation via Generative Adversarial Nets with Reinforcement Learning

Tianyu Cui (Institute of Information Engineering, University of Chinese Academy of Sciences, China); Gaopeng Gou (Institute of Information Engineering, Chinese Academy of Sciences, China); Gang Xiong, Chang Liu, Peipei Fu and Zhen Li (Institute of Information Engineering, Chinese Academy of Sciences, China)

Global IPv6 scanning has always been a challenge for researchers because of limited network speed and computational power. Target generation algorithms have recently been proposed to overcome this problem for Internet assessments by predicting a candidate set to scan. However, custom IPv6 address configuration gives rise to diverse addressing patterns that discourage algorithmic inference, and widespread IPv6 aliasing can mislead algorithms into discovering aliased regions rather than valid host targets. In this paper, we introduce 6GAN, a novel architecture built with Generative Adversarial Nets (GANs) and reinforcement learning for multi-pattern target generation. 6GAN trains multiple generators against a multi-class discriminator and an alias detector to generate non-aliased active targets with different addressing pattern types. The rewards from the discriminator and the alias detector supervise the address sequence decision-making process. After adversarial training, 6GAN's generators retain a strong ability to imitate each pattern, and its discriminator attains outstanding pattern discrimination with 0.966 accuracy. Experiments indicate that our work outperforms state-of-the-art target generation algorithms by producing a higher-quality candidate set.
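The training signal the abstract describes can be sketched as a REINFORCE update in which a finished address's reward combines the pattern discriminator's score with the alias detector's score. The toy generator below uses independent per-nibble categoricals instead of the paper's sequence model, and `disc_prob`/`alias_prob` are hypothetical stand-ins for the trained discriminator and alias detector:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros((32, 16))  # one categorical per nibble of an IPv6 address

def sample_address(logits):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    addr = np.array([rng.choice(16, p=p) for p in probs])
    return addr, probs

def reinforce_step(logits, addr, probs, reward, lr=0.1):
    # REINFORCE: raise the log-probability of each sampled nibble,
    # scaled by the sequence-level reward.
    for pos, nib in enumerate(addr):
        grad = -probs[pos]
        grad[nib] += 1.0
        logits[pos] += lr * reward * grad
    return logits

# Hypothetical stand-ins for the trained discriminator and alias detector:
disc_prob = lambda a: 0.9 if a[0] == 2 else 0.1  # pattern-match score
alias_prob = lambda a: 0.05                      # aliased-region score

addr, probs = sample_address(logits)
r = disc_prob(addr) * (1.0 - alias_prob(addr))   # reward shaping (sketch)
logits = reinforce_step(logits, addr, probs, r)
```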

6Hit: A Reinforcement Learning-based Approach to Target Generation for Internet-wide IPv6 Scanning

Bingnan Hou and Zhiping Cai (National University of Defense Technology, China); Kui Wu (University of Victoria, Canada); Jinshu Su (National University of Defense Technology, China); Yinqiao Xiong (National University of Defense Technology, China)

Fast Internet-wide network measurement plays an important role in cybersecurity analysis and network asset detection. The vast address space of IPv6, however, makes it infeasible to scan the entire network by brute force. Even worse, the extremely uneven distribution of active IPv6 addresses results in a low hit rate for active scanning. To address this problem, we propose 6Hit, a reinforcement learning-based target generation method for active address discovery in the IPv6 address space. It first divides the IPv6 address space into regions according to the structural information of a set of known seed addresses, then allocates exploration resources according to the reward of scanning each region. Based on the evaluative feedback from existing scanning results, 6Hit steers the subsequent search toward regions with a higher density of active addresses. Compared with other state-of-the-art target generation methods, 6Hit achieves a better hit rate: in our experiments over real-world networks, it reaches a 3.5%-11.5% hit rate on the eight candidate datasets, a 7.7%-630% improvement over the state-of-the-art methods.
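The resource allocation idea can be sketched as bandit-style feedback over address regions: probe budget flows toward regions whose past scans showed higher hit density, with a small exploration floor. The allocation rule below is illustrative, not 6Hit's exact update:

```python
import numpy as np

def allocate_probes(hits, probes, budget, eps=1e-9):
    """Favor regions with higher observed hit density (evaluative feedback)
    while keeping a little probing everywhere; rule is a sketch."""
    density = hits / (probes + eps)              # per-region hit rate so far
    weights = density + 0.01                     # small exploration floor
    share = weights / weights.sum()
    return np.floor(share * budget).astype(int)  # probes for the next round

hits = np.array([40, 2, 0, 15])       # active addresses found per region
probes = np.array([1000, 1000, 1000, 1000])
print(allocate_probes(hits, probes, budget=10000))
```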

Asynchronous Deep Reinforcement Learning for Data-Driven Task Offloading in MEC-Empowered Vehicular Networks

Penglin Dai, Kaiwen Hu, Xiao Wu and Huanlai Xing (Southwest Jiaotong University, China); Zhaofei Yu (Peking University, China)

Mobile edge computing (MEC) has become an effective paradigm for supporting real-time, computation-intensive vehicular applications. However, due to the highly dynamic vehicular topology, existing centralized or distributed scheduling algorithms, which require high communication overhead, are not suitable for task offloading in vehicular networks. We therefore investigate a novel service scenario of MEC-based vehicular crowdsourcing, where each MEC server is an independent agent responsible for scheduling the processing of traffic data sensed by crowdsourcing vehicles. On this basis, we formulate a data-driven task offloading problem that jointly optimizes offloading decisions, bandwidth/computation resource allocation, and the renting cost of heterogeneous servers (powerful vehicles, MEC servers, and the cloud); the problem is a mixed-integer program and NP-hard. To reduce the time complexity, we solve it in two stages. First, we design an asynchronous deep Q-learning method to determine offloading decisions; it achieves fast convergence by training a local DQN model at each agent in parallel and uploading updates to the global model asynchronously. Second, we decompose the remaining resource allocation problem into several independent subproblems and derive optimal analytic formulas based on convex theory. Lastly, we build a simulation model and conduct comprehensive simulations, which demonstrate the superiority of the proposed algorithm.
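A minimal stand-in for the asynchronous stage: each MEC agent pulls the global parameters, runs local Q-learning steps, and pushes its weights back without waiting for the other agents. A linear Q-function replaces the paper's deep network, and the blending rule is a hypothetical simplification:

```python
import numpy as np

class GlobalModel:
    """Shared parameter store updated asynchronously by MEC agents."""

    def __init__(self, dim, n_actions):
        self.W = np.zeros((n_actions, dim))

    def pull(self):
        return self.W.copy()

    def push(self, local_W, mix=0.5):
        # Asynchronous update: blend an agent's local weights into the
        # global model without waiting for other agents (assumed rule).
        self.W = (1 - mix) * self.W + mix * local_W

def local_q_step(W, s, a, r, s_next, gamma=0.9, lr=0.01):
    # One Q-learning step on the agent's local copy (linear Q-function
    # standing in for the paper's DQN).
    td_target = r + gamma * np.max(W @ s_next)
    td_error = td_target - W[a] @ s
    W[a] += lr * td_error * s
    return W
```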

DeepReserve: Dynamic Edge Server Reservation for Connected Vehicles with Deep Reinforcement Learning

Jiawei Zhang, Suhong Chen, Xudong Wang and Yifei Zhu (Shanghai Jiao Tong University, China)

Edge computing is a promising way to provide computational resources for connected vehicles (CVs), but resource demands on edge servers vary with vehicle mobility, making it challenging to reserve edge servers to meet variable demands. Existing schemes rely on statistical information about resource demands to determine edge server reservation; they are infeasible in practice, since reservation based on statistics cannot adapt to time-varying demands. In this paper, a spatio-temporal reinforcement learning scheme called DeepReserve is developed to learn variable demands and reserve edge servers accordingly. DeepReserve adapts the deep deterministic policy gradient (DDPG) algorithm with two major enhancements. First, observing that the spatio-temporal correlation in vehicle traffic induces the same property in the resource demands of CVs, a convolutional LSTM network is employed to encode the resource demands observed by edge servers and infer future demands. Second, an action amender is designed to ensure an action does not violate the spatio-temporal correlation. We also design a new training method, DR-Train, to stabilize the training procedure. DeepReserve is evaluated via experiments on real-world datasets; the results show that it outperforms state-of-the-art approaches that require accurate demand information.
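The second enhancement can be sketched as a post-processing step on the raw DDPG action: nudge each server's reservation toward its spatial neighbors' average and keep it above the forecast demand, so the action respects the spatio-temporal correlation. The paper's amender differs in detail; this is an illustrative stand-in:

```python
import numpy as np

def amend_action(action, demand_forecast, smooth=0.3):
    """Hand-written 'action amender' sketch: smooth reservations across
    neighboring edge servers and floor them at forecast demand."""
    neighbor_avg = np.convolve(action, np.ones(3) / 3, mode="same")
    amended = (1 - smooth) * action + smooth * neighbor_avg
    # Never reserve below the forecast demand floor.
    return np.maximum(amended, demand_forecast)

action = np.array([2.0, 0.5, 3.0, 1.0])      # raw DDPG reservations
forecast = np.array([1.0, 1.0, 2.0, 1.5])    # ConvLSTM demand forecast
print(amend_action(action, forecast))
```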

Session Chair

Xiaowen Gong (Auburn University)
